Crawling in Rogue's dungeons with (partitioned) A3C
Rogue is a famous dungeon-crawling video game of the 1980s, the ancestor of
its genre. Rogue-like games are known for the necessity to explore partially
observable and always different randomly-generated labyrinths, preventing any
form of level replay. As such, they serve as a very natural and challenging
task for reinforcement learning, requiring the acquisition of complex,
non-reactive behaviors involving memory and planning. In this article we show
how, exploiting a version of A3C partitioned on different situations, the agent
is able to reach the stairs and descend to the next level in 98% of cases.
Comment: Accepted at the Fourth International Conference on Machine Learning, Optimization, and Data Science (LOD 2018)
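The idea of partitioning the policy by situation can be sketched as follows. This is a hypothetical illustration in our own terms, not the paper's implementation: the situation labels, dispatch rule, and placeholder policies below are invented; a real partitioned A3C would train one actor-critic network per partition.

```python
import random

# Hypothetical sketch of "partitioned" A3C at inference time: instead of a
# single network, the agent keeps one policy per situation and dispatches
# each observation to the matching partition. Situation names and the
# dispatch rule are illustrative only.

ACTIONS = ["up", "down", "left", "right", "descend"]

def situation(obs):
    """Map an observation to a discrete situation label."""
    if obs.get("on_stairs"):
        return "stairs"
    if obs.get("stairs_visible"):
        return "stairs_seen"
    return "explore"

class BiasedPolicy:
    """Placeholder for a per-situation A3C actor network."""
    def __init__(self, preferred):
        self.preferred = preferred
    def act(self, obs, rng):
        # A trained actor would sample from softmax logits; here we just
        # bias toward the partition's preferred action.
        return self.preferred if rng.random() < 0.9 else rng.choice(ACTIONS)

policies = {
    "explore": BiasedPolicy("right"),
    "stairs_seen": BiasedPolicy("down"),
    "stairs": BiasedPolicy("descend"),
}

def step(obs, rng):
    """Dispatch the observation to the policy for its situation."""
    return policies[situation(obs)].act(obs, rng)

rng = random.Random(0)
print(step({"on_stairs": True}, rng))
```

The dispatch keeps each sub-policy's learning problem narrow, which is one plausible reading of why partitioning helps in a partially observable dungeon.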
Recommended from our members
Performance Enhancement of Deep Reinforcement Learning Networks using Feature Extraction
The combination of Deep Learning and Reinforcement Learning, termed Deep Reinforcement Learning Networks (DRLN), offers the possibility of using a Deep Learning Neural Network to produce an approximate Reinforcement Learning value table that allows extraction of features from neurons in the hidden layers of the network. This paper presents a two-stage technique for training a DRLN on features extracted from a DRLN trained on an identical problem, via an implementation of the Q-Learning algorithm in TensorFlow. The results show that extracting features from the hidden layers of the Deep Q-Network improves the agent's learning process (4.58 times faster and to a better result) and demonstrates that the hidden layers encode information about the environment which can be used to select the best action. The research contributes preliminary work to an ongoing project on modeling features extracted from DRLNs.
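The two-stage idea can be illustrated with a toy forward pass. Everything here is an assumption for illustration: the network sizes and weights are made up, and the paper trains both stages with Q-learning in TensorFlow rather than using fixed weights.

```python
# Hedged sketch of the feature-extraction idea: run a state through a
# "trained" Q-network, read off the hidden-layer activations, and use those
# activations as the input features for a second-stage network.

def relu(x):
    return [max(0.0, v) for v in x]

def linear(x, W, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

# Stage 1: a toy Q-network (2 state inputs -> 3 hidden units -> 2 Q-values).
W1, b1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.1, 0.0]
W2, b2 = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]], [0.0, 0.0]

def q_values_and_features(state):
    hidden = relu(linear(state, W1, b1))   # hidden activations = features
    return linear(hidden, W2, b2), hidden

q, features = q_values_and_features([1.0, 0.5])
# Stage 2 would train a fresh DRLN whose input is `features` rather than
# the raw state, reusing the encoded information about the environment.
print(q, features)
```

In a real implementation the hidden activations would be read from the trained TensorFlow graph rather than recomputed by hand.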
Identifying Critical States by the Action-Based Variance of Expected Return
The balance of exploration and exploitation plays a crucial role in
accelerating reinforcement learning (RL). To deploy an RL agent in human
society, its explainability is also essential. However, basic RL approaches
have difficulties in deciding when to choose exploitation as well as in
extracting useful points for a brief explanation of its operation. One reason
for the difficulties is that these approaches treat all states the same way.
Here, we show that identifying critical states and treating them specially is
commonly beneficial to both problems. These critical states are the states at
which the action selection changes the potential of success and failure
substantially. We propose to identify the critical states using the variance in
the Q-function for the actions and to perform exploitation with high
probability on the identified states. These simple methods accelerate RL in a
grid world with cliffs and two baseline tasks of deep RL. Our results also
demonstrate that the identified critical states are intuitively interpretable
regarding the crucial nature of the action selection. Furthermore, our analysis
of the relationship between the timing of the identification of especially
critical states and the rapid progress of learning suggests there are a few
especially critical states that have important information for accelerating RL
rapidly.
Comment: 12 pages, 6 figures
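The selection rule described above can be sketched in a few lines. The threshold, exploitation probability, and epsilon below are our own illustrative choices, not values from the paper:

```python
import random

# Minimal sketch of the proposed rule: treat a state as "critical" when the
# variance of its Q-values across actions is large, and exploit (take the
# greedy action) there with high probability.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def select_action(q_values, rng, threshold=1.0, epsilon=0.3, p_exploit=0.95):
    critical = variance(q_values) > threshold
    greedy = max(range(len(q_values)), key=q_values.__getitem__)
    if critical:
        # On critical states, exploit with high probability.
        return greedy if rng.random() < p_exploit else rng.randrange(len(q_values))
    # Elsewhere, ordinary epsilon-greedy exploration.
    return greedy if rng.random() > epsilon else rng.randrange(len(q_values))

rng = random.Random(1)
# Near a cliff edge one action is catastrophic, so the Q-values spread out:
# high variance marks the state as critical and the agent exploits.
print(select_action([1.0, -10.0, 0.9], rng))
```

The same variance statistic doubles as the explanation device: states flagged as critical are the ones worth showing a human.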
C-tests revisited: back and forth with complexity
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-21365-1_28
We explore the aggregation of tasks by weighting them using a difficulty
function that depends on the complexity of the (acceptable) policy for the task (instead
of a universal distribution over tasks or an adaptive test). The resulting aggregations
and decompositions are (now retrospectively) seen as the natural (and trivial) interactive
generalisation of the C-tests.
This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2010-21062-C02-02, PCIN-2013-037 and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana PROMETEOII 2015/013.
Hernández-Orallo, J. (2015). C-tests revisited: back and forth with complexity. In: Artificial General Intelligence, 8th International Conference, AGI 2015, Berlin, Germany, July 22-25, 2015, Proceedings. Springer International Publishing, pp. 272-282. https://doi.org/10.1007/978-3-319-21365-1_28
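A difficulty-weighted aggregation of this kind can be sketched in our own notation (not the paper's exact formulation): score a policy by its expected return at each difficulty level, then weight the levels.

```latex
% \Psi(\pi): aggregated score of policy \pi over tasks \mu.
% h(\mu): difficulty of task \mu, derived from the complexity of the
%         simplest acceptable policy for \mu (not a universal distribution).
% w(d): weight assigned to difficulty level d.
\Psi(\pi) \;=\; \sum_{d} w(d)\,
  \mathop{\mathbb{E}}_{\mu \,:\, h(\mu)=d}\!\big[\, R(\pi,\mu) \,\big]
```

Reading the C-tests as the non-interactive special case of this aggregation is the "back and forth" the title refers to.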
Deep Reinforcement Learning: An Overview
In recent years, a specific machine learning method called deep learning has
gained considerable attention, as it has obtained astonishing results in broad
applications such as pattern recognition, speech recognition, computer vision,
and natural language processing. Recent research has also shown that deep
learning techniques can be combined with reinforcement learning methods to
learn useful representations for problems with high-dimensional raw data
input. This chapter reviews the recent advances in deep reinforcement learning
with a focus on the most used deep architectures such as autoencoders,
convolutional neural networks, and recurrent neural networks, which have
been successfully combined with the reinforcement learning framework.
Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 201
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
Autonomous agents must learn to collaborate. It is not scalable to develop a
new centralized agent every time a task's difficulty outpaces a single agent's
abilities. While multi-agent collaboration research has flourished in
gridworld-like environments, relatively little work has considered visually
rich domains. Addressing this, we introduce the novel task FurnMove in which
agents work together to move a piece of furniture through a living room to a
goal. Unlike existing tasks, FurnMove requires agents to coordinate at every
timestep. We identify two challenges when training agents to complete FurnMove:
existing decentralized action sampling procedures do not permit expressive
joint action policies and, in tasks requiring close coordination, the number of
failed actions dominates successful actions. To confront these challenges we
introduce SYNC-policies (synchronize your actions coherently) and CORDIAL
(coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58%
completion rate on FurnMove, an impressive absolute gain of 25 percentage
points over competitive decentralized baselines. Our dataset, code, and
pretrained models are available at https://unnat.github.io/cordial-sync .
Comment: Accepted to ECCV 2020 (spotlight); Project page: https://unnat.github.io/cordial-syn
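The core SYNC idea, expressive joint policies from decentralized agents, can be sketched with shared randomness. The strategies, action names, and probabilities below are invented for illustration; the actual SYNC-policies are learned mixtures, not hand-set tables:

```python
import random

# Hedged sketch: each agent mixes over K "strategies" (per-agent marginal
# policies). At every timestep both agents draw the SAME strategy index from
# shared randomness, then sample their own action from that strategy's
# marginal. The joint distribution is thus a mixture of products, which is
# strictly more expressive than a single product of independent marginals.

def sample(dist, rng):
    """Sample a key from a {outcome: probability} table."""
    r, acc = rng.random(), 0.0
    for a, p in dist.items():
        acc += p
        if r < acc:
            return a
    return a  # guard against floating-point slack

# Two strategies requiring close coordination: "both push" vs "both rotate".
mixture = {0: 0.5, 1: 0.5}
agent_marginals = [
    [{"push": 0.9, "wait": 0.1}, {"rotate": 0.9, "wait": 0.1}],  # agent 0
    [{"push": 0.9, "wait": 0.1}, {"rotate": 0.9, "wait": 0.1}],  # agent 1
]

def joint_action(seed):
    shared = random.Random(seed)       # synchronized randomness per timestep
    k = sample(mixture, shared)        # both agents commit to the same k
    return [sample(agent_marginals[i][k], random.Random(seed * 100 + i + 1))
            for i in range(2)]

print(joint_action(0))
```

With independent marginals each agent would push half the time, so the agents would mismatch ("push", "rotate") about half the time; synchronizing the mixture component makes coordinated joint actions the common case.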
Online Continual Learning on Sequences
Online continual learning (OCL) refers to the ability of a system to learn
over time from a continuous stream of data without having to revisit previously
encountered training samples. Learning continually in a single data pass is
crucial for agents and robots operating in changing environments and required
to acquire, fine-tune, and transfer increasingly complex representations from
non-i.i.d. input distributions. Machine learning models that address OCL must
alleviate \textit{catastrophic forgetting} in which hidden representations are
disrupted or completely overwritten when learning from streams of novel input.
In this chapter, we summarize and discuss recent deep learning models that
address OCL on sequential input through the use (and combination) of synaptic
regularization, structural plasticity, and experience replay. Different
implementations of replay have been proposed that alleviate catastrophic
forgetting in connectionist architectures via the re-occurrence of (latent
representations of) input sequences and that functionally resemble mechanisms
of hippocampal replay in the mammalian brain. Empirical evidence shows that
architectures endowed with experience replay typically outperform architectures
without in (online) incremental learning tasks.
Comment: L. Oneto et al. (eds.), Recent Trends in Learning From Data, Studies in Computational Intelligence 89
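A minimal version of the experience-replay mechanism discussed above can be sketched as follows. This is our own toy construction (reservoir sampling over the stream), not a specific model from the chapter:

```python
import random

# Illustrative sketch of experience replay for online continual learning:
# interleave each new stream sample with a few samples replayed from a small
# memory, so earlier inputs keep being rehearsed and their representations
# are not catastrophically overwritten in a single data pass.

class ReplayBuffer:
    def __init__(self, capacity, rng):
        self.capacity, self.rng = capacity, rng
        self.memory, self.seen = [], 0

    def add(self, item):
        # Reservoir sampling keeps a uniform sample of the whole stream,
        # even though each item is observed exactly once.
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = item

    def sample(self, k):
        return self.rng.sample(self.memory, min(k, len(self.memory)))

rng = random.Random(0)
buf = ReplayBuffer(capacity=5, rng=rng)
for x in range(100):              # single pass over a non-i.i.d. stream
    batch = [x] + buf.sample(2)   # new sample plus replayed ones
    # train_step(model, batch)    # placeholder for the actual update
    buf.add(x)
print(sorted(buf.memory))
```

Latent replay, mentioned in the chapter, replaces the raw items in `memory` with hidden-layer representations, trading fidelity for storage.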
Increasing generality in machine learning through procedural content generation
Procedural Content Generation (PCG) refers to the practice, in videogames and
other games, of generating content such as levels, quests, or characters
algorithmically. Motivated by the need to make games replayable, as well as to
reduce authoring burden, limit storage space requirements, and enable
particular aesthetics, a large number of PCG methods have been devised by game
developers. Additionally, researchers have explored adapting methods from
machine learning, optimization, and constraint solving to PCG problems. Games
have been widely used in AI research since the inception of the field, and in
recent years have been used to develop and benchmark new machine learning
algorithms. Through this practice, it has become more apparent that these
algorithms are susceptible to overfitting. Often, an algorithm will not learn a
general policy, but instead a policy that will only work for a particular
version of a particular task with particular initial parameters. In response,
researchers have begun exploring randomization of problem parameters to
counteract such overfitting and to allow trained policies to more easily
transfer from one environment to another, such as from a simulated robot to a
robot in the real world. Here we review the large amount of existing work on
PCG, which we believe has an important role to play in increasing the
generality of machine learning methods. The main goal here is to present the
RL/AI community with new tools from the PCG toolbox; a secondary goal is to
explain to game developers and researchers how their work is relevant to AI
research.
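The randomization idea the review centers on can be made concrete with a toy level generator. The generator below is a deliberately simple sketch of our own, not a method from the paper:

```python
import random

# Toy illustration of PCG for generality: every training episode gets a
# freshly generated level, so a policy cannot overfit to one fixed layout
# and must learn behavior that transfers across level instances.

def generate_level(width, height, wall_prob, rng):
    """Generate a random gridworld: '#' walls, '.' floor, 'S' start, 'G' goal."""
    grid = [["#" if rng.random() < wall_prob else "."
             for _ in range(width)] for _ in range(height)]
    grid[0][0] = "S"                    # fixed start corner
    grid[height - 1][width - 1] = "G"   # fixed goal corner
    return grid

rng = random.Random(42)
# Each episode trains on a different procedurally generated level:
for episode in range(3):
    level = generate_level(8, 5, 0.2, rng)
    # rollout(policy, level)            # placeholder for the training loop
print("\n".join("".join(row) for row in level))
```

Domain randomization for sim-to-real transfer, as discussed above, applies the same recipe to physics parameters and textures instead of level tiles.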
Trappin-2/Elafin Modulate Innate Immune Responses of Human Endometrial Epithelial Cells to PolyI:C
BACKGROUND: Upon viral recognition, innate and adaptive antiviral immune responses are initiated by genital epithelial cells (ECs) to eradicate or contain viral infection. Such responses, however, are often accompanied by inflammation that contributes to acquisition and progression of sexually transmitted infections (STIs). Hence, interventions/factors enhancing antiviral protection while reducing inflammation may prove beneficial in controlling the spread of STIs. Serine antiprotease trappin-2 (Tr) and its cleaved form, elafin (E), are alarm antimicrobials secreted by multiple cells, including genital epithelia. METHODOLOGY AND PRINCIPAL FINDINGS: We investigated whether and how each Tr and E (Tr/E) contribute to antiviral defenses against a synthetic mimic of viral dsRNA, polyinosine-polycytidylic acid (polyI:C) and vesicular stomatitis virus. We show that delivery of a replication-deficient adenovector expressing Tr gene (Ad/Tr) to human endometrial epithelial cells, HEC-1A, resulted in secretion of functional Tr, whereas both Tr/E were detected in response to polyI:C. Moreover, Tr/E were found to significantly reduce viral replication by either acting directly on virus or through enhancing polyI:C-driven antiviral protection. The latter was associated with reduced levels of pro-inflammatory factors IL-8, IL-6, TNFα, lowered expression of RIG-I, MDA5 and attenuated NF-κB activation. Interestingly, enhanced polyI:C-driven antiviral protection of HEC-Ad/Tr cells was partially mediated through IRF3 activation, but not associated with higher induction of IFNβ, suggesting multiple antiviral mechanisms of Tr/E and the involvement of alternative factors or pathways. CONCLUSIONS AND SIGNIFICANCE: This is the first evidence of both Tr/E altering viral binding/entry, innate recognition and mounting of antiviral and inflammatory responses in genital ECs that could have significant implications for homeostasis of the female genital tract